Translation of serial recursive codes to parallel SIMD codes
Author
Abstract
Parallelizing compilers are an important way of making parallel systems easy to use and more acceptable to the mainstream of computing. Several approaches to parallelization have been taken, such as loop parallelization and dependence-graph-based parallelization. This paper takes another approach: exploiting the recursion in serial recursive codes to translate them into parallel nonrecursive SIMD codes. The principal idea is first to perform a formal data-dependence and data-flow analysis on both the input-partitioning process and the subsolution-merge process, to determine the computation and communication of each partition and each merge. We then develop a system of equations from which we derive the sequences of computation instructions and communication instructions, along with their processor enabling/disabling masks, for the SIMD target code. We develop the approach, give a code translation algorithm, and apply it to a sample of common recursive algorithms to illustrate its power. The approach is very effective and efficient, especially for recursive algorithms that are balanced and whose input can be padded to an appropriate size without affecting the desired output. For some recursive algorithms that do not fit this category, we argue that a top-down MIMD execution is preferable to SIMD execution. Finally, we note that our approach generalizes to interleaved recursion.
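To make the idea concrete, here is a minimal sketch of the kind of translation the abstract describes, applied to a balanced recursive sum. It is not the paper's translation algorithm: the function names `recursive_sum` and `simd_sum`, the stride-doubling schedule, and the simulation of processors as list slots are all assumptions made for illustration. The input is padded with zeros to a power of two, which does not change the sum.

```python
def recursive_sum(xs):
    """Serial divide-and-conquer sum (assumes xs is non-empty)."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    # Partition the input, solve each half, merge by addition.
    return recursive_sum(xs[:mid]) + recursive_sum(xs[mid:])


def simd_sum(xs):
    """Nonrecursive SIMD-style sum, simulated with one list slot per processor."""
    # Pad to the next power of two with 0, which leaves the sum unchanged.
    n = 1
    while n < len(xs):
        n *= 2
    vals = list(xs) + [0] * (n - len(xs))

    stride = 1
    while stride < n:
        # Mask: only processors whose index is a multiple of 2*stride are enabled.
        mask = [i % (2 * stride) == 0 for i in range(n)]
        # Communication step: each processor receives the value held `stride` away.
        received = [vals[i + stride] if i + stride < n else 0 for i in range(n)]
        # Computation step: enabled processors merge, disabled ones keep their value.
        vals = [v + r if m else v for v, r, m in zip(vals, received, mask)]
        stride *= 2
    return vals[0]
```

Each pass of the loop corresponds to one level of merges in the recursion tree, executed bottom-up: every processor runs the same communication and computation instruction, and the mask decides which processors take part.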
Similar resources
Improved upper bounds on the ML decoding error probability of parallel and serial concatenated turbo codes via their ensemble distance spectrum
The ensemble performance of parallel and serial concatenated turbo codes is considered, where the ensemble is generated by a uniform choice of the interleaver and of the component codes taken from the set of time varying recursive systematic convolutional codes. Following the derivation of the input-output weight enumeration functions of the ensembles of random parallel and serial concatenated ...
A Gallager-Tanner construction based on convolutional codes
Generalized low density codes are built by applying a Tanner-like construction to binary recursive systematic convolutional codes. The Gallager-Tanner construction is restricted to 2 levels only. We describe the structure of a GLD code and show how to compute its ensemble performance. We also prove that RSC based GLD codes are asymptotically good. A parity-check interpretation of turbo codes is...
Parallel Dual Tree Traversal on Multi-core and Many-core Architectures for Astrophysical N-body Simulations
In astrophysical N-body simulations, Dehnen’s algorithm, implemented in the serial falcON code and based on a dual tree traversal, is faster than serial Barnes-Hut tree-codes but is outperformed by parallel CPU and GPU tree-codes. In this paper, we present a parallel dual tree traversal, implemented in the pfalcON code, targeting multi-core CPUs and many-core architectures (Xeon Phi). We focus he...
Optimization Strategies for WRF Single-Moment 6-Class Microphysics Scheme (WSM6) on Intel Microarchitectures
Optimizations in the petascale era require modifications of existing codes to take advantage of new architectures with large core counts and SIMD vector units. This paper examines high-level and low-level optimization strategies for numerical weather prediction (NWP) codes. These strategies employ thread-local structures of arrays (SOA) and an OpenMP directive such as OMP SIMD. These optimizati...
An approach to fault detection and correction in design of systems using of Turbo codes
We present an approach to the design of fault-tolerant computing systems. In this paper, a technique is employed that enables the combination of several codes in order to obtain flexibility in the design of error-correcting codes. Code-combining techniques are very effective, and turbo codes are one such code. The algorithm-based fault tolerance techniques that, to detect errors, rely on the c...